Abstract: QDMiner aims to help users find the main points of multiple documents and thus save them the time of reading whole documents. Unlike most existing summarization systems, which generate summaries from sentences extracted from the documents and return a flat list of sentences, QDMiner returns multiple groups of semantically related items. However, the relative importance of the side-information associated with the documents may be difficult to estimate, especially when some of it is noisy. In such cases, integrating side-information into the mining process can be risky, because it can either improve the quality of the representation used for mining or add noise to it. A principled way of performing the mining process is therefore required, so as to maximize the advantages of using this side-information. This project designs an algorithm that combines classical partitioning algorithms with probabilistic models to create an effective clustering approach.

Keywords: QDMiner, partitioning algorithms, summarization.